Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Authors
Abstract
The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than state-of-the-art at lower computational cost.
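The core mechanism described above — a trainable gating network that picks a sparse subset of expert sub-networks per example and combines their outputs — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: it uses plain noiseless top-k gating and omits the noise term, load-balancing losses, and batched expert dispatch that the full method relies on. All class and variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SparseMoE:
    """Toy sparsely-gated MoE layer: each input is routed to its top-k experts."""
    def __init__(self, d_in, d_hidden, n_experts, k):
        self.k = k
        self.W_gate = rng.normal(0, 0.1, (d_in, n_experts))  # gating network weights
        # each expert is a tiny two-layer feed-forward network
        self.experts = [
            (rng.normal(0, 0.1, (d_in, d_hidden)),
             rng.normal(0, 0.1, (d_hidden, d_in)))
            for _ in range(n_experts)
        ]

    def __call__(self, x):
        # gating logits per example; keep only the k largest, softmax over those
        logits = x @ self.W_gate                     # (batch, n_experts)
        topk = np.argsort(logits, axis=-1)[:, -self.k:]
        out = np.zeros_like(x)
        for i, row in enumerate(x):
            gates = softmax(logits[i, topk[i]])      # sparse gate values, sum to 1
            for gate, e in zip(gates, topk[i]):
                W1, W2 = self.experts[e]
                # only the k selected experts run for this example
                out[i] += gate * (np.maximum(row @ W1, 0) @ W2)
        return out

moe = SparseMoE(d_in=8, d_hidden=16, n_experts=4, k=2)
y = moe(rng.normal(size=(3, 8)))
print(y.shape)  # (3, 8)
```

Because only k of the n experts execute per example, compute scales with k while parameter count scales with n — the decoupling that lets the paper grow capacity by orders of magnitude at roughly constant per-example cost.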
Similar Resources
SEISMIC DESIGN OF DOUBLE LAYER GRIDS BY NEURAL NETWORKS
The main contribution of the present paper is to train efficient neural networks for seismic design of double layer grids subject to multiple-earthquake loading. As the seismic analysis and design of such large scale structures require high computational efforts, employing neural network techniques substantially decreases the computational burden. Square-on-square double layer grids with the va...
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures
We present a simple dynamic batching approach applicable to a large class of dynamic architectures that consistently yields speedups of over 10x. We provide performance bounds when the architecture is not known a priori and a stronger bound in the special case where the architecture is a predetermined balanced tree. We evaluate our approach on Johnson et al.’s recent visual question answering (...
PREDICTION OF COMPRESSIVE STRENGTH AND DURABILITY OF HIGH PERFORMANCE CONCRETE BY ARTIFICIAL NEURAL NETWORKS
Neural networks have recently been widely used to model some of the human activities in many areas of civil engineering applications. In the present paper, artificial neural networks (ANN) for predicting compressive strength of cubes and durability of concrete containing metakaolin with fly ash and silica fume with fly ash are developed at the age of 3, 7, 28, 56 and 90 days. For building these...
Prediction of monthly rainfall using artificial neural network mixture approach, Case Study: Torbat-e Heydariyeh
Rainfall is one of the most important elements of the water cycle used in evaluating the climate conditions of each region. Long-term forecasting of rainfall for arid and semi-arid regions is very important for the management and planning of water resources. To forecast appropriately, accurate data regarding humidity, temperature, pressure, wind speed, etc. are required. This article is analytical and its database...
Data Mining for Features Using Scale-Sensitive Gated Experts
This article introduces a new tool for exploratory data analysis and data mining called Scale-Sensitive Gated Experts (SSGE) which can partition a complex nonlinear regression surface into a set of simpler surfaces (which we call features). The set of simpler surfaces has the property that each element of the set can be efficiently modeled by a single feedforward neural network. The degree to ...
Journal: CoRR
Volume: abs/1701.06538
Publication year: 2017